Conservation detection dogs: A critical review of efficacy and methodology

Abstract Conservation detection dogs (CDD) use their exceptional olfactory abilities to assist a wide range of conservation projects through the detection of target specimens or species. CDD are generally quicker, can cover wider areas and find more samples than humans and other analytical tools. However, their efficacy varies between studies; methodological and procedural standardisation in the field is lacking. Considering the cost of deploying a CDD team and the limited financial resources within conservation, it is vital that their performance is quantified and reliable. This review aims to summarise what is currently known about the use of scent detection dogs in conservation and elucidate which factors affect efficacy. We describe the efficacy of CDD across species and situational contexts like training and fieldwork. Reported sensitivities (i.e. the proportion of target samples found out of total available) ranged from 23.8% to 100% and precision rates (i.e. proportion of alerts that are true positives) from 27% to 100%. CDD are consistently shown to be better than other techniques, but performance varies substantially across the literature. There is no consistent difference in efficacy between training, testing and fieldwork, hence we need to understand the factors affecting this. We highlight the key variables that can alter CDD performance. External effects include target odour, training methods, sample management, search methodology, environment and the CDD handler. Internal effects include dog breed, personality, diet, age and health. Unfortunately, much of the research fails to provide adequate information on the dogs, handlers, training, experience and samples. This results in an inability to determine precisely why an individual study has high or low efficacy. 
It is clear that CDDs can be effective and applied to possibly limitless conservation scenarios, but moving forward researchers must provide more consistent and detailed methodologies so that comparisons can be conducted, results are more easily replicated and progress can be made in standardising CDD work.


| INTRODUCTION
Domestic dogs (Canis lupus familiaris) have worked alongside humans for thousands of years, primarily used for hunting, guarding and even forensic work by the ancient Greeks (Bergström et al., 2020; Helton, 2009; MacKay et al., 2008; Otto et al., 2019; Shields & Austin, 2018; Whitehouse-Tedd et al., 2021). Even now, dogs support humans by assisting those with disabilities, herding livestock in agriculture, providing protection in law enforcement and military and utilising their sense of smell to find a vast range of substances (Otto et al., 2019; Whitehouse-Tedd et al., 2021; Woollett et al., 2013).
Within canine scent work, one of the most rapidly growing areas is conservation.

Dogs began working in conservation in the 1890s in New Zealand when they supported efforts to translocate kiwis and kakapos away from areas inhabited by invasive predators (Hill & Hill, 1987). Since then, there has been an almost unlimited scope of their application. Conservation detection dogs (CDD) can perform a variety of tasks like searching for live or dead specimens, nests or burrows and residual scent from hair or urine (Bennett et al., 2022; Helton, 2009; Kokocińska-Kusiak et al., 2021; Woollett et al., 2013).
Additionally, scat surveys have been used for indicating animal presence, particularly by applying DNA analytical techniques like barcoding (i.e. species identification (Arnot et al., 1993)) and profiling (i.e. identification of an individual organism (Giardina, 2013)) to located scats, especially when the scat of different species is visually indistinguishable (Bennett et al., 2022; Helton, 2009; MacKay et al., 2008). CDD use has been documented in 62 countries across over 480 biological species including terrestrial, avian and aquatic mammals, birds, reptiles, amphibians, insects, molluscs, fungi, bacteria and invasive plants (Grimm-Seyfarth et al., 2021). Seemingly, scent detection dogs have 'limitless potential' (Woollett et al., 2013, p. 261) and their application is restricted only by the 'human imagination' (Browne et al., 2006, p. 101). They are invaluable, especially during a time when biodiversity is deeply threatened and the risk of extinction faces many species (Ceballos et al., 2017; Díaz et al., 2019).
Given that most animals have olfactory capabilities for navigation and communication (Kokocińska-Kusiak et al., 2021), why are dogs used most frequently for conservation detection work rather than other species?A key factor is the sheer capacity of canine olfaction.Dogs have up to 250 million olfactory receptors, depending on breed, in comparison to five million in humans (MacKay et al., 2008;Woollett et al., 2013) and can detect odours at concentrations as low as one part per trillion whereas analytical instruments are restricted to parts per billion (Otto et al., 2019).This is due to the unique anatomy of the canine nasal organ and brain (Jenkins et al., 2018;Jezierski et al., 2016;Kokocińska-Kusiak et al., 2021;MacKay et al., 2008).However, rats, insects and pigs also have the ability to be trained to perform scent discrimination like CDD (Bijland et al., 2013;Cambau & Poljak, 2020;Oh et al., 2015;Teodoro-Morrison et al., 2014), so why are these species used less frequently?
For conservation work, trainability and capability in the field are required in addition to olfactory acuity (DeMatteo et al., 2019).
Canine domestication means that the species has been selected for sociability, motivation and flexibility of learning (Beebe et al., 2016;Helton, 2009;Otto et al., 2019); psychological traits necessary for conducting complex scent work alongside humans.Furthermore, most conservation work takes place outdoors for several hours in varied weather, topographical and vegetation conditions, meaning CDD must be able to traverse great distances, over extended periods of time, while manoeuvring through obstacles.As such, specific physical features are sought when selecting a dog: stamina, agility and resilience to temperature to name a few (Beebe et al., 2016;DeMatteo et al., 2019;Helton, 2009;Otto et al., 2019).These are characteristics seen in many dogs that are rarely found in smaller or less domesticated species.
CDD have been highly beneficial to conservation outcomes.
Their use is non-invasive which protects environmental and wildlife welfare and is preferable to capture-recapture methods (Browne et al., 2006; Grimm-Seyfarth & Klenke, 2018; Kerley, 2010; Richards, 2018). Across many circumstances, CDD are faster, can find more samples and cover greater distances during a survey than other methods (Browne et al., 2006; Grimm-Seyfarth & Klenke, 2018; Kerley, 2010; MacKay et al., 2008; Stanhope & Sloan, 2019). For example, Mathews et al. (2013) found that when comparing humans and CDD during searches for bat carcasses at wind turbine sites, CDD took on average 40 min to conduct a search versus humans taking 2 h and 46 min, and CDD found 75% of targets versus humans finding 20%. Furthermore, using CDD can reduce sampling bias as they do not rely on visual information to find targets the way methods like human surveys and camera-trapping do. Therefore, CDD are more capable of finding obscured samples and those in visually less obvious places (Kerley, 2010; MacKay et al., 2008). Additionally, CDD can play the role of ambassador for conservation work through people's affinity towards dogs (Richards, 2018; Witherington et al., 2017).
However, like any detection tool, disadvantages must also be considered.CDD teams are expensive both in terms of time and money.It takes months, if not years, to train a CDD and its handler along with the monetary cost of training and maintaining the dog through transport, housing, food, etc. (Kerley, 2010;MacKay et al., 2008).Acquiring samples for training can be difficult both practically and legally depending on whether the target species is elusive, endangered, or invasive (Kerley, 2010;MacKay et al., 2008).
Moreover, despite generally high efficacy rates, substantial variation occurs (MacKay et al., 2008) which brings CDD reliability into question.Indeed, modern guidelines for conservation methods, such as 'What Works in Conservation 2021' by Sutherland et al. (2021) along with governmental protocols for target species searches (Thompson et al., 2020), do not include CDD despite their widespread use, which may be indicative of the concerns around their efficacy.Given that conservation suffers from underfunding (Cozzi et al., 2021;Cristescu et al., 2020), the tool used for a project must be worth the cost.
Hence, this review aims to answer the questions of how, why and to what extent efficacy varies, as these must be understood to achieve the best results possible when using CDD. To do this, available CDD studies were searched for (n = 67) and analysed in light of these questions. A major difficulty facing CDD work is a lack of standardisation across the field (Bennett et al., 2020; Hayes et al., 2018; Johnen et al., 2017; Otto et al., 2019). This is despite efforts made to standardise procedures for the use of scent detection dogs in general (Furton et al., 2010), and for the testing and reporting of CDD work (Bennett et al., 2020; Johnen et al., 2017).
At present in CDD literature, terminology for analytical measures is inconsistent (Hayes et al., 2018;Johnen et al., 2017), sample sizes are small leading to low statistical power (Lazarowski et al., 2020;Whitehouse-Tedd et al., 2021) and according to a systematic review by Johnen et al. (2017), up to 70% of CDD studies report limited training details and almost 25% were considered to be poor quality.All these factors together greatly harm the field's reliability and replicability, which is key to verifying results and improving future research.By assessing efficacy and methodology, issues in the literature can be highlighted, thereby increasing understanding of best practices.In this review, the efficacy of CDD will be investigated across training, testing and operational searches and when searching for different target species.Once efficacy rates have been established, the factors affecting efficacy will be discussed along with how methodological problems may be contributing.

| Selection criteria
Key inclusion criteria informed which studies were selected for the review. The study must have included the use of CDD, whether in a laboratory or field environment, and the setting needed to be stated. The performance of the CDD must have been assessed using some type of quantifiable measure. The study also needed to specify what species or at least what group of species the CDDs were trained and tasked with searching for. As this review aimed to assess and critique the efficacy and methodologies of CDD studies across the field, the quality of the chosen papers was not part of the selection criteria in order to fully reflect the current state of the field.

| Search
Literature searches took place during April 2022 using the Queen's University Belfast online library (https://www.qub.ac.uk/directorates/InformationServices/TheLibrary/). Search terms used were 'conservation detection dogs' and 'ecology detection dogs'. In total, 67 studies were included in this review based on these searches. It is noted that a more in-depth search technique involving additional search terms and the use of further databases may have yielded more studies to be included, thus making this a limitation of this study.

| Data collection and analysis
Studies with information and results relevant to the selection criteria were collected for analysis and review. Firstly, studies were screened by title, then by abstract, whereby if inspection of the abstract suggested that the study would not meet the necessary criteria for inclusion, it was excluded. Data were extracted from the chosen studies relating to methodologies (i.e. target scent(s), study setting, the use of blinding, whether comparisons were done with other methods, number of CDDs used and the operational experience of the CDD handler(s)) and the results of the CDD performance (i.e. sensitivity and precision; Table 1).

| EFFICACY RATES ACROSS CONTEXTS
When assessing the efficacy of CDD, one must be consistent in which measures are considered to ensure as little bias as possible.
However, it can be unclear what a study is measuring and terms like 'detection rates' may be used without stating what they quantify regarding the search and dog performance (Hayes et al., 2018; Johnen et al., 2017). Bennett et al. (2020) recommend sensitivity (i.e. proportion of target samples found out of total available), and precision (i.e. proportion of alerts that are true positives), also known as 'reliability' or 'predictive positive value', as measures to be used for evaluating CDD performance. Sensitivity can investigate performance during training and testing which can then help predict the probability of detection during operational searches, as sensitivity in the field is difficult to ascertain without estimating the total number of targets in an area often using techniques with high margins of error like playback (Bolton et al., 2021). Precision aids in determining the ability of the CDD to distinguish and discriminate the target scent from other odours. Lazarowski et al. (2020) propose measuring sensitivity and 'specificity' (i.e. proportion of non-alerts that are true negatives) in tandem as key to scent detection work. However, they also acknowledge that specificity is often challenging to accurately measure due to the limitless number of distractor scents that may be available during field trials or operational searches, as well as the difficulty of ascertaining that the target scent is completely absent in a natural environment. As such for this review, sensitivity and precision will be the measures of focus (see Table 1). Of the studies reviewed, 46% stated sensitivity rates, though this rate rises to 79% if counting only studies where sensitivity could have been assessed, i.e. because the number of potential targets is known (n = 39), and 55% provide precision rates.
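The two focus measures defined above can be expressed as a short calculation. The following is a minimal sketch only: the class name, field names and example counts are hypothetical and are not taken from any reviewed study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchCounts:
    """Hypothetical tally from one CDD trial (all names are illustrative)."""
    targets_available: int  # total target samples known to be present
    targets_found: int      # target samples the dog alerted on (true positives)
    false_alerts: int       # alerts on non-target items (false positives)


def sensitivity(counts: SearchCounts) -> float:
    """Proportion of target samples found out of the total available."""
    if counts.targets_available <= 0:
        raise ValueError("sensitivity needs a known, non-zero number of targets")
    return counts.targets_found / counts.targets_available


def precision(counts: SearchCounts) -> float:
    """Proportion of all alerts that are true positives."""
    total_alerts = counts.targets_found + counts.false_alerts
    if total_alerts <= 0:
        raise ValueError("precision is undefined when the dog never alerted")
    return counts.targets_found / total_alerts


if __name__ == "__main__":
    # Made-up example: 20 samples placed, 15 found, 5 alerts on non-targets.
    trial = SearchCounts(targets_available=20, targets_found=15, false_alerts=5)
    print(f"sensitivity = {sensitivity(trial):.2f}")
    print(f"precision   = {precision(trial):.2f}")
```

The sketch makes explicit why sensitivity cannot be reported for many field searches: it requires the total number of available targets, whereas precision only requires that each alert be verified.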
However, it must be acknowledged that in the field of CDD research and applications, one-to-one comparisons of efficacy rates between studies are made difficult by small sample sizes (Lazarowski et al., 2020;Whitehouse-Tedd et al., 2021) and lack of detail about the studies (Johnen et al., 2017).Indeed, of the 67 studies identified for this review, the sample sizes of CDDs ranged from 1 to 20, and 6% (n = 4) did not report this clearly (see Table 2).For the purposes of detecting statistically significant differences and effect sizes, these sample sizes are far too low.But in many ways, this is the nature of the field given the costs and practicalities associated with training and maintaining a CDD (Kerley, 2010;MacKay et al., 2008) and it is unlikely that future studies with vastly larger sample sizes will be produced any time soon.Therefore, there is a need to assess the current state of research in the field, comparing specific comparable measures where available (Table 1) while also acknowledging limitations.
Searching for bird species through scat, carcasses, or eggs has resulted in sensitivity rates between 62.6% and 100% with precision reported between 50% and 100% (Arnesen et al., 2020; Bolton et al., 2021; Fukuhara et al., 2022; Reynolds et al., 2021). However, in the study by Arnesen et al. (2020), where dogs searched for rock ptarmigan (Lagopus muta) scat in lab conditions, three dogs out of four performed no better than chance and none of the dogs or handlers had any previous experience of training for CDD work.
Regarding the 29% sensitivity in Chambers et al. (2015) for finding natural bat roosts, this was during a search for both natural bat roosts and suspended bags of guano where guano was the original trained target. This could have caused the CDD to have a preference for the guano samples (i.e. on which they had been imprinted and trained) over the bat roosts which were novel. Indeed, sensitivity was 79% on guano bags alone, and sensitivity for bat roosts increased from 29% to 77% when the dogs searched only for bat roosts, without guano bags present. The concepts of using different samples in training versus testing, generalisation of CDD to non-trained targets and the effects of odour concentration in search performance are elaborated on further in the Training section.
For larger mammals, sensitivity rates during training and testing of between 23.8% for sheep remains and 93.3% for cheetah scat are reported (de Oliveira et al., 2012; Hansen & Winje, 2021; Hofmann et al., 2021; Reed et al., 2011) with Hofmann et al. (2021) demonstrating 100% precision on cheetah scat. Although 23.8% sensitivity for CDD seems low, this was compared to the 2.5% sensitivity of human
TABLE 1 Glossary of metrics used within the literature review.
TABLE 2 Summary of literature reviewed that investigates CDD efficacy rates.
Improvements in detection by even small proportions can be hugely beneficial as conservation projects often rely on methods with overall low detection rates (Mathews et al., 2013). These examples demonstrate little pattern regarding the target species when it comes to success during training and testing, except for greater variation with mammal targets. This could be due to several things: an inherent issue with the target odours, variation in the quality of the studies, or simply the greater number of papers in that area (i.e. of 67 studies reviewed: 44 on mammals, eight on reptiles, seven on birds with three of these overlapping with mammal studies, seven on invertebrates, three on plants, one on amphibians) (see Table 2).
CDD efficacy should be evaluated during training and testing rather than waiting until operational searches to assess performance. However, many published studies simply investigate whether CDD can discriminate the target odour in a simple controlled trial and do not progress to testing the CDD in a field environment under operational conditions. Indeed, of the 67 studies examined in this review, Of studies assessing performance in the field, scat surveys of mammals are the most prevalent with precision rates of between 30.8% and 100% (Beckmann, 2006; Clare et al., 2015; Cozzi et al., 2021; Davidson et al., 2014; DeMatteo et al., 2014; Furtado et al., 2008; Harrison, 2006; Hatlauf et al., 2021; Hollerbach et al., 2018; Kretser et al., 2016; Orkin et al., 2016; Sentilles et al., 2021; Smith et al., 2003; Statham et al., 2020; Thompson et al., 2012; Vynne et al., 2011). Low rates of precision may occur because it can be difficult for the handler to accurately identify scats visually, which can lead to them accidentally rewarding indications on non-target scats (i.e. false positives), thereby reinforcing these indications and increasing their subsequent frequency.
Additionally, CDD may be correctly alerting and DNA barcoding and Unfortunately, even while assessing the ability of CDD using these set measures, not every study reports results clearly enough to make inferences. Sometimes, the number of targets found is the only measure reported due to budget constraints, being unable to verify true positives in the field (e.g. small mammals hiding or denning in inaccessible places (Thomas et al., 2020)), or simply a lack of information given within the study itself (Arandjelovic et al., 2015; Bearman-Brown et al., 2020; Becker et al., 2017; Brook et al., 2012; Dematteo et al., 2009; Glen et al., 2016; Jean-Marie et al., 2019; Kapfer et al., 2012; Liczner et al., 2021; Long et al., 2007; McGregor et al., 2016; Petroelje et al., 2021; Rolland et al., 2007; Thomas et al., 2020; Wasser et al., 2004, 2012).

| Training
Training is the foundation of CDD performance with several stages including imprinting, indication, search tasks and discrimination trials (DeMatteo et al., 2019).Each has the potential to affect efficacy.
In the context of scent detection dogs, imprinting is the process of familiarising the CDD with the target odour (Mosconi et al., 2017).
Given the sensitivity of the canine nose, sample handling during training must be conducted with care (Kokocińska-Kusiak et al., 2021;Lazarowski et al., 2020).Subtle aspects of sample preparation can lead to the dog learning that another odour is paired with the reward rather than the target itself (Guest et al., 2020).Papers often provide only limited information on sample storage and handling so no inference can be made on whether this affected efficacy.Indeed, issues identified regarding sample use include sample contamination with human scent (Arnett, 2006) or other non-target scents (Vynne et al., 2011), poor decontamination procedures like running under hot water rather than sterilisation of sample storage devices, dog saliva touching sample containers (Rutter, Howell, et al., 2021a, 2021b), and urination and/or defecation by dogs during searches (Browne et al., 2015;DeMatteo et al., 2018;Heaton et al., 2008) which poses a threat to samples and ecosystems (Whitehouse-Tedd et al., 2021).
Goss (2019) provides detailed information on the importance of proper sample storage and which materials are and are not appropriate for use as storage devices.Furthermore, a review of detection dog work suggests that over 20% of studies may have used the same samples across training and testing (Johnen et al., 2017) which means the dog may have learnt the specific samples rather than the target odour profile (Stanhope & Sloan, 2019).
Given that CDD are biological systems, their olfactory function is subject to many influences (Kokocińska-Kusiak et al., 2021).Factors linked to reductions in olfaction capability include older age, use of certain pharmaceuticals, diseases, dehydration, diet and nutrition, activity levels and environmental influences like temperature, humidity and precipitation (Gutzwiller, 1990;Hayes et al., 2018;Jenkins et al., 2018;Kokocińska-Kusiak et al., 2021).There is simply no way to know if any internal variables may have played a role in CDD efficacy if details are missing about the dogs used and their care.
Furthermore, the target odour that a CDD has been trained to find can also affect operational search efficacy, as it is unclear whether CDD search for complete odour signatures or simply components of the target odour that are present across samples and conditions (Johnen et al., 2017).Indeed, CDD are very capable of generalising from low scent profiles during training to full specimens in the field and vice versa (Dematteo et al., 2009;Oldenburg et al., 2016;Rutter, Mynott, et al., 2021).However, depending on the samples used to train the dog, different errors may be made in the field.For example, if trained on low concentrations of odour then CDD may alert where no visual sample can be found due to residual scent from past specimen presence, which is an issue that Duggan et al. (2011) faced when searching for Franklin's ground squirrel.Alternatively, smaller samples may be missed more frequently than larger samples as seen with Goodwin et al. (2010) in searches for spotted knapweed.This can occur depending on whether training involved only high odour concentration samples or failed to simulate any aspect of search environments through field tests and discrimination training, meaning the sample can be masked by non-target scents from wildlife or the environment (Gutzwiller, 1990).
Indication or alerting is how a CDD informs a handler that they have found a target through a distinct and consistent change in behaviour (Johnen et al., 2017). Indications can be passive (i.e. no interaction with target) or active (i.e. body contact with target) depending on the needs of a project. Passive indication is recommended for CDD work to protect sample integrity and the safety of both the dog and wildlife (DeMatteo et al., 2019; MacKay et al., 2008; Matthew et al., 2021; Mosconi et al., 2017). However, details and definitions of CDD indications are regularly omitted in the literature. Furthermore, some authors report changes of behaviour (COB; i.e. notable shifts in CDD behaviour that suggest the dog has found or is tracking a scent) or partial indications as a suitable criterion for a true positive (Cablk & Heaton, 2006; Clare et al., 2015; Duggan et al., 2011; Hoyer-Tomiczek et al., 2016; Stevenson et al., 2010), which is far more subjective, open to interpretation and unable to be standardised, thus affecting efficacy rates (Lazarowski et al., 2020).
Several types of search tasks can be used when training and testing CDD efficacy (Helton, 2009).Multiple-choice tasks are where the CDD has the option to investigate multiple containers and is rewarded if they alert on the correct one (Gadbois & Reeve, 2016;Helton, 2009).These can simulate exposure to different scents available in the field and also facilitate discrimination training which is key to ensuring CDD are exposed to commonly encountered scents that should be ignored in favour of the target odour (Arnesen & Rosell, 2021;Bennett, 2015;Boroski & Oliver, 2018;Gadbois & Reeve, 2016;Mosconi et al., 2017;Statham et al., 2020).However, they also provide more sensory interference for the dog and can cause preferences for specific container positions which makes assessing true odour discrimination and indication performance more difficult (Gadbois & Reeve, 2016;Lazarowski et al., 2020).
Alternatively, yes/no or go/no-go tasks involve presenting the dog with a singular sample and rewarding if they make the correct choice in alerting or ignoring (Gadbois & Reeve, 2016;Helton, 2009).These allow for a clear examination of where the dog may be making mistakes and whether they are making choices more liberally (i.e. more false positives) or conservatively (i.e. more false negatives; Gadbois & Reeve, 2016).However, requiring the dog to have greater response inhibition during these tasks can make them needlessly challenging (Lazarowski et al., 2020).Yes/no tasks have been recommended for CDD as they make the calculation of specificity and measures of accuracy comparable (Gadbois & Reeve, 2016;Johnen et al., 2017), but multiple-choice tasks are commonly seen in the literature.Although this method has benefits, it often lacks details on dog performance which can help estimate and explain field efficacy rates.
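One common way to separate discrimination ability from a liberal or conservative response tendency in yes/no tasks is the signal detection measures d' and criterion c. The sketch below is an assumption on our part about how such an analysis could be set up, not a method reported in the reviewed studies; the function name, the log-linear correction and the example counts are illustrative.

```python
from statistics import NormalDist


def sdt_measures(hits: int, misses: int, false_alarms: int, correct_rejections: int):
    """Return (d_prime, criterion) from yes/no trial counts (illustrative)."""
    if hits + misses <= 0 or false_alarms + correct_rejections <= 0:
        raise ValueError("need at least one target trial and one non-target trial")
    # Log-linear correction keeps both rates strictly between 0 and 1,
    # so the inverse normal transform below is always defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    d_prime = z(hit_rate) - z(fa_rate)            # discrimination ability
    criterion = -0.5 * (z(hit_rate) + z(fa_rate))  # response bias
    return d_prime, criterion


if __name__ == "__main__":
    # Made-up counts: a dog that alerts readily and one that alerts sparingly.
    for label, counts in [("liberal", (19, 1, 8, 12)), ("conservative", (10, 10, 1, 19))]:
        d_prime, criterion = sdt_measures(*counts)
        print(f"{label}: d' = {d_prime:.2f}, c = {criterion:.2f}")
```

Under this convention a negative criterion indicates a liberal tendency (more false positives) and a positive criterion a conservative one (more false negatives), while d' summarises how well target and non-target samples are told apart regardless of that tendency.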
A vital factor for ensuring efficacy results are reliable is blinding (Elliker et al., 2014). Single blinding is done to ensure the dog is using olfaction rather than memory to find the target, but double blinding is preferred, where neither the handler nor the tester knows where the target is (Boroski & Oliver, 2018; Johnen et al., 2017; Lazarowski et al., 2020; Stanhope & Sloan, 2019). This avoids the 'Clever Hans effect', named after a horse that seemingly could count but was instead reading human behaviour to determine when the correct response was given in order to receive a reward (Lazarowski et al., 2020; Sebeok & Rosenthal, 1981). Domesticated animals like dogs are highly skilled at reading human behaviour (Lazarowski et al., 2019), so even in cases where the handler or tester knows the target location and believes that efficacy will be unbiased due to the dog ignoring them for the most part (Browne et al., 2015; Domínguez del Valle et al., 2020; Needs et al., 2021; Vesely, 2008), they may still unconsciously and unintentionally signal the location of the target to the CDD. Indeed, Kardish et al. (2015) found that within ecological, evolutionary and behavioural research, only 13.3% of studies susceptible to observer bias reported the use of blinding. In our own review we found that 82% of the studies described in Table 2 where blinding could apply (i.e. where training/testing took place; n = 39) used blinding, with 91% of these being double blinding and 9% single blinding. In other cases, it is either unreported or more worryingly not being conducted at all, though it is a comfort to see the rates of blinding higher than previous studies suggested.

| CDD selection and the handler
Although CDD are used as a tool for detection, unlike analytical devices each individual dog will differ, which means the criteria used to select CDD are vital for efficacy. There is little doubt that all dogs with a functioning sense of smell can detect a target that emits odour (Woollett et al., 2013). This has been demonstrated with pet dogs and their owners that have been trained to perform scent discrimination and search tasks for novel odours similar to CDD teams (Rutter, Howell, et al., 2021a, 2021b). However, the breed of CDD is often considered influential in achieving the biological and psychosocial traits necessary for fieldwork. Breeds that have been historically selected for their scent abilities are frequently used under the belief that they will inherently perform well (Lazarowski et al., 2020). However, individual differences can affect efficacy (Jamieson et al., 2017). Across CDD literature 128 breeds of dogs have been used and minimal differences found in suitability (Grimm-Seyfarth et al., 2021). Furthermore, the assumption that brachycephalic breeds will perform worse is unverified, with pugs outperforming German Shepherds in scent discrimination tests (Hall et al., 2015), although their ability to physically endure under field conditions is untested.
More important than breed-specific differences is individual personality.No standardised measures for conducting personality testing exist and it is unknown when in the dog's life cycle their ability to work can be determined (Beebe et al., 2016).Indeed, wastage (i.e. failing training) is a major problem in breeding for CDD as the dog may be unsuited to conservation work (Byosiere et al., 2019).The essential characteristics for CDD are high play and/or food drive, high hunt drive and low prey drive (Bearman-Brown et al., 2020;Beebe et al., 2016;DeMatteo et al., 2019;Helton, 2009;Jamieson et al., 2017;Smith et al., 2003;Statham et al., 2020;Vynne et al., 2011;Wasser et al., 2004;Willcox et al., 2019).However, most assessments of these traits rely on the subjective view of whoever chooses the dog (Beebe et al., 2016).Moreover, dogs are biological systems and there will always be an amount of variability in performance based on countless internal and external factors throughout their development (Kokocińska-Kusiak et al., 2021;Woollett et al., 2013).
CDDs must work as a team alongside a human handler who oversees searches, verifies finds and reinforces training.As such, the handler also plays a crucial role in CDD outcomes.Similar to dogs, specific skills and traits must be demonstrated to become a handler: ability to direct a search by assessing where the dog has yet to investigate, understanding of animal behaviour, learning and scent theory, attention to detail, consistency and endurance for working in field conditions (Beebe et al., 2016;Boroski & Oliver, 2018;DeMatteo et al., 2019;Helton, 2009).
Handlers can both positively and negatively influence dog performance. The handler's beliefs about how a search will go or the dog itself (Lit et al., 2011), the handler's behaviour during a search regarding possible finds, the handler's level of experience (Jamieson et al., 2018b; Lazarowski et al., 2019, 2020) and their personality can all affect the dog's behaviour (Hayes et al., 2018; Jamieson et al., 2018a; MacKay et al., 2008). Furthermore, the bond between a CDD and handler matters for search performance (Bennett, 2015; Mosconi et al., 2017; Otto et al., 2019). Dogs working with an unfamiliar handler display more stress-related behaviours and have reduced search efficacy, if they will even search at all (Jamieson et al., 2018b; Springer, 2011).

| Search environment and method
Various elements of a search, including the area and methods used, also play a role in efficacy. The environment is cited as integral to efficacy variation (Beebe et al., 2016; Bennett, 2015; Kokocińska-Kusiak et al., 2021; Lazarowski et al., 2020; Wasser et al., 2004), but the results of how it can alter CDD performance are mixed (Glen & Veltman, 2018). In some cases, detection rates have been seen to have a positive relationship with wind speed (Mutoro et al., 2021). With vegetation density, a weak negative relationship has been reported for CDD, but a strong negative relationship for humans (Domínguez del Valle et al., 2020).
The effect of vegetation density can also be altered by other elements such as temperature: Grimm-Seyfarth (2022) found a negative relationship between detection probability and temperature when searching in short grass, and a positive relationship when searching in tall grass. There are a few proposed explanations for this, such as how vegetation density can alter scent movement (Gutzwiller, 1990), and how higher temperatures can lead to reduced humidity, increased panting rates for the dog (Osterkamp, 2020), increased direct sunlight (Gutzwiller, 1990; MacKay et al., 2008) and greater numbers of flying insects which may deter the CDD (MacKay et al., 2008) or move the scent plume (Osterkamp, 2020), all of which can reduce detection probability. Furthermore, precipitation can be a concern as it can wash away or degrade samples (Reed et al., 2011). In other cases, no effects of temperature, wind speed, humidity, or vegetation were found across studies looking for a range of targets including mammalian carnivore scats, bat and bird carcasses at windfarms, scat from different species of quoll, Hermann tortoises, cheetah scat and bird carcasses infected with avian botulism (Hofmann et al., 2021; Jean-Marie et al., 2019; Leigh & Dominick, 2015; Long et al., 2007; Mutoro et al., 2021; Paula et al., 2011; Reed et al., 2011; Reynolds et al., 2021; Smith et al., 2005; Thompson et al., 2012). Indeed, it should be noted that not only the environmental conditions themselves, but also how a handler deals with them can have an impact on the search. This includes how the environment affects handler fatigue, which in turn impacts the handler's behaviour and body language (Osterkamp, 2020), as well as their ability to keep on-transect while also focusing on the CDD (MacKay et al., 2008). As such, it is clear that environmental effects can be highly variable, and MacKay et al. (2008) argue that the question of the effect of environmental conditions on CDD searches needs to be given further attention.
Regarding search methods, elements that differ include searching on or off leash, operational time and effective search distance. In terms of how dogs search alongside handlers, it is recommended that CDD perform off-leash searches to avoid handler bias and allow the dog to move freely and make independent decisions regarding following scent trails (Bennett, 2015; Domínguez del Valle et al., 2020; MacKay et al., 2008). This would mean that those who opt for line search, where the dog is leashed, may be inadvertently altering efficacy. However, line search must be conducted in some circumstances due to safety concerns for the dog regarding the environment or predators, dense vegetation, or safety for wildlife (DeMatteo et al., 2014; Hansen & Winje, 2021; MacKay et al., 2008). Line searches can also be useful for detailed searches for small odour sources, but are not necessary (MacKay et al., 2008; Woollett et al., 2013). Traditionally, operational searches occur in 30-min intervals (Centre for the Protection of National Infrastructure, 2018), but evidence suggests dogs may be able to work continuously for up to 2 h if so trained (Garner et al., 2001). As such, if the dog has been conditioned poorly for

TABLE 3 Checklist of variables that could affect efficacy, and should be included in studies describing training, testing, or operational work of conservation detection dogs (CDD).

Variable Description and questions to answer in study
Training samples
These are the samples used to train the dog on the target odour.
• What are they composed of? How many of them were there? How often were they used?
• How were they stored? How were they handled?

Testing samples
These are the samples used to test the dog's efficacy.
• Were these different to training samples? If not, why not?
• What samples were used for discrimination? Were they very similar to the target sample and why were they selected?

Other target odours
Many CDD are used in multiple studies and therefore may have more than one target odour.
• Have the dogs used been previously trained on other species that could be in that environment?

Odour level
This is a description of the concentration of the odour the dog is trained or tested on, i.e. parts per million or size and surface area of sample.
• Has the dog been trained to find a variety of odour levels, and do those odour levels represent what the dog will be looking for in the field? For example, dogs that are used to find bats around turbines should be trained to find bat body parts as well as full carcasses, as that is what they will be finding in the field.

Indication
The final response trained to show a dog has found something.
• What indication is the dog using? Is it appropriate for the species being detected?

Dog selection
This describes the dogs used for the study and why they were chosen.
• Why were the dogs used for this study selected?
• What are the breeds, personalities, diets, ages and overall health of the dogs selected?

Operational experience of the dog and handler team and of the dog team's trainers
This describes the experience of the trainer of the dog, and the operational experience of the dog and handler team.
• What experience do the trainers of the dog and the dog team have? Have they worked in conservation before?
• How long has this dog and team been operational? Has the dog had previous finds on other species? Has this handler worked with this dog previously?
• Are the handler and dog used to working in this environment and have they been trained to do the length of searches asked of them?

Environment
These are environmental variables that should be recorded before and during a search.
• Record the temperature, humidity, wind speed, vegetation density and precipitation before and during the search.

Search
This describes how the search was conducted.
• How was the search conducted? Was it on or off lead? What was the operational search distance and why was this selected?

Efficacy
This describes if the dog is demonstrating the desired effect of their training, that is, to locate specific species or field signs.
• Has some measure of sensitivity or precision been measured? If not, why not?
• Has there been some measure to show that the dog is demonstrating the desired effect of their training?
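One way to make such a checklist actionable is to record each search in a fixed structure and flag whatever was left unreported. The sketch below is only an illustration of that idea: the class `SearchRecord`, its field names and units, and the helper `missing_fields` are all assumptions made here and are not a schema proposed by the review or by any cited study. Only a subset of the checklist variables is shown.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class SearchRecord:
    """One CDD search, with a subset of the checklist variables.

    All field names and units are illustrative assumptions.
    A value of None means the variable was not reported.
    """
    dog_breed: Optional[str] = None
    dog_age_years: Optional[float] = None
    handler_worked_with_dog_before: Optional[bool] = None
    indication: Optional[str] = None           # the trained final response
    on_lead: Optional[bool] = None
    temperature_c: Optional[float] = None
    humidity_pct: Optional[float] = None
    wind_speed_m_s: Optional[float] = None
    vegetation_density: Optional[str] = None
    precipitation_mm: Optional[float] = None


def missing_fields(record: SearchRecord) -> List[str]:
    """Return the names of checklist fields that were left unreported."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]


# Hypothetical record: only three variables were written down.
record = SearchRecord(dog_breed="Labrador", on_lead=False, temperature_c=18.5)
print(missing_fields(record))
```

Treating "not reported" as an explicit `None` rather than a default value keeps a missing measurement distinguishable from a measured zero, such as no precipitation.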

Reed et al. (2011) Mountain lion, Bobcat, Domestic cat, Red fox (Vulpes vulpes), Grey fox (Urocyon cinereoargenteus), Kit fox (Vulpes macrotis)

* indicates additional detail/context regarding methodology and how results were reported in the respective study. ** indicates further information regarding methods and results reporting if/when several different techniques were used by the respective study. Abbreviations: CDD, conservation detection dogs; F, field work; NS, not stated; N/A, not applicable; TT, training/testing.
37% focus only on training and testing (n = 25), 42% assess solely field performance (n = 28) and 21% look at both (n = 14).Of those studies that measure training and testing performance, 33% conduct their experiments in purely lab-based or controlled field conditions.Moreover, seemingly obvious statistics are sometimes stated such as strong positive correlations between CDD alerts and true positives (Bolton et al., 2021; Oldenburg et al., 2016) which simply means that the dog is doing what it has been trained to do; an unsurprising result given the decades of effective scent detection work performed by canines.This breakdown shows a skew towards laboratory-based and controlled trials that do not translate into assessing fieldwork capabilities or improving methodological practices.Sensitivity and precision rates within fieldwork vary similarly to those of training and testing.Although most operational windfarm mortality searches did not report precision, Paula et al. (2011) achieved rates of 100% meaning all indications were true positives.
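The two efficacy measures used throughout this review can be written down directly from their definitions: sensitivity is the proportion of available target samples that were found, and precision is the proportion of alerts that were true positives. The following is a minimal sketch; the function names and the example counts are hypothetical and do not come from any cited study.

```python
def sensitivity(targets_found: int, targets_available: int) -> float:
    """Proportion of available target samples that the dog found."""
    if targets_available <= 0:
        raise ValueError("targets_available must be positive")
    if not 0 <= targets_found <= targets_available:
        raise ValueError("targets_found must lie between 0 and targets_available")
    return targets_found / targets_available


def precision(true_positive_alerts: int, total_alerts: int) -> float:
    """Proportion of the dog's alerts that were true positives."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    if not 0 <= true_positive_alerts <= total_alerts:
        raise ValueError("true_positive_alerts must lie between 0 and total_alerts")
    return true_positive_alerts / total_alerts


# Hypothetical trial: 10 samples placed, 8 found, 9 alerts in total.
print(f"sensitivity = {sensitivity(8, 10):.1%}")
print(f"precision   = {precision(8, 9):.1%}")
```

A precision of 100%, as reported by Paula et al. (2011), corresponds to the case where `true_positive_alerts` equals `total_alerts`. Precision is undefined when no alerts were made, which is why the sketch raises an error rather than returning zero.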
profiling of the scat can be wrong due to contamination from nontarget species resulting from coprophagy, urination, and contact with saliva (DeMatteo et al., 2018). Furthermore, both Hollerbach et al. (2018) and Kretser et al. (2016) used CDDs which had also been trained to indicate on other targets as part of previous work. Training CDD to detect multiple species with overlapping habitats can lead to indications on all targets. As such, most of the false positives in these studies were indications on previously trained targets; although classified as false positives in the context of the study, they are not false positives in the context of the dog's training.
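The distinction between a false positive in the context of the study and one in the context of the dog's training can be made concrete by computing precision under both definitions. The sketch below assumes each alert has been verified and labelled with what was actually present (or `None` if nothing was found); the function name, the labels and the counts are hypothetical and are not data from Hollerbach et al. (2018) or Kretser et al. (2016).

```python
from collections import Counter
from typing import Iterable, Optional, Sequence, Tuple


def precision_by_context(
    alerts: Sequence[Optional[str]],
    study_target: str,
    previously_trained: Iterable[str],
) -> Tuple[float, float]:
    """Return (study-context precision, training-context precision).

    alerts: verified identity of what was present at each alert,
            or None if nothing was present.
    """
    if not alerts:
        raise ValueError("at least one alert is required")
    counts = Counter(alerts)
    study_true_positives = counts[study_target]
    # Alerts on other trained targets count as correct only in the
    # training context.
    other_trained = set(previously_trained) - {study_target}
    training_true_positives = study_true_positives + sum(
        counts[t] for t in other_trained
    )
    total = len(alerts)
    return study_true_positives / total, training_true_positives / total


# Hypothetical verified alerts from one survey.
alerts = ["target_a", "target_a", "target_b", "target_a", None]
study_p, training_p = precision_by_context(alerts, "target_a", ["target_b"])
print(study_p, training_p)
```

Reporting both values would show how much of an apparent precision shortfall is attributable to the dog's earlier training rather than to genuine errors.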

• Is the handler able to identify the dog's indication or change of behaviour, and what change of behaviour is being used that equates to a positive indication?

Blinding
This describes who is present during training and testing and who knows where the target is during training and testing.
• Some form of blinding should be used throughout training and testing. During training single blinding can be performed for certain stages, but should be changed to double blinding for the last stages of training, and for testing. If this is not the case, why not?